-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Store arrays offsets for ip fields natively with synthetic source #122999
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store arrays offsets for ip fields natively with synthetic source #122999
Conversation
efc2fc8 to
87e8f8d
Compare
Follow up of elastic#113757 and adds support to natively store array offsets for ip fields instead of falling back to ignored source.
87e8f8d to
62a3b50
Compare
|
Hi @martijnvg, I've created a changelog YAML for you. |
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
| if (address != null) { | ||
| indexValue(context, address); | ||
| } | ||
| if (offsetsFieldName != null && context.isImmediateParentAnArray() && context.getRecordedSource() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be using context.canAddIgnoredField instead of context.getRecordedSource. We had this comment in the previous PR, let's update KeywordFieldMapper too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: 2a72a77
| throw e; | ||
| } | ||
| } | ||
| if (address != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is address set to null, ever? It's set to nullValue or value, iiuc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nullValue is null by default, so address can be set to null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then, should we be comparing to nullValue here and below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point: 77c30b1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually that change fails the build. If we only check nullValue then also value needs to be checked...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pushed another commit
|
|
||
| long ord = ords[offset]; | ||
| BytesRef c = valueDocValues.lookupOrd(ord); | ||
| // This is keyword specific and needs to be updated once support is added for other field types: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still need a comment, as the value is written as utf8. Maybe something about what the converter is expected to return, in the constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: 86a1aa0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we enforce utf8 if this goes to xcontentbuilder anyway that has a generic value method that takes anything?
|
|
||
| @Override | ||
| protected String randomSyntheticSourceKeep() { | ||
| return "all"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The testSyntheticSourceKeepArrays(...) unit test assumes ignored source and therefor assumes nested leaf arrays are maintained, which doesn't any more with the new offset encoding.
This was required for KeywordFieldMapperTests too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we update the test to check the new behavior as an alternative, if keep is arrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually want to merge OffsetDocValuesLoaderTestCase into MapperTestCase and then update this test to test the new behaviour. Let's do this in a follow up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice refactoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding ip support to data generation tests :)
I will do that in a follow up. |
💔 Backport failed
You can use sqren/backport to manually backport by running |
Backporting elastic#122999 to 8.x branch. Follow up of elastic#113757 and adds support to natively store array offsets for ip fields instead of falling back to ignored source.
…ce (#123405) * [8.x] Store arrays offsets for ip fields natively with synthetic source Backporting #122999 to 8.x branch. Follow up of #113757 and adds support to natively store array offsets for ip fields instead of falling back to ignored source. * [CI] Auto commit changes from spotless --------- Co-authored-by: elasticsearchmachine <[email protected]>
…stic#124594) This patch builds on the work in elastic#122999 and elastic#113757 to natively store array offsets for numeric fields instead of falling back to ignored source when `source_keep_mode: arrays`.
#124594) | Fix ignores malformed testcase (#125337) | Fix offsets not recording duplicate values (#125354) (#125440) * Natively store synthetic source array offsets for numeric fields (#124594) This patch builds on the work in #122999 and #113757 to natively store array offsets for numeric fields instead of falling back to ignored source when `source_keep_mode: arrays`. (cherry picked from commit 376abfe) # Conflicts: # server/src/main/java/org/elasticsearch/index/IndexVersions.java # server/src/main/java/org/elasticsearch/index/mapper/NumberFieldMapper.java * Fix ignores malformed testcase (#125337) Fix and unmute testSynthesizeArrayRandomIgnoresMalformed (cherry picked from commit 2ff03ac) # Conflicts: # muted-tests.yml * Fix offsets not recording duplicate values (#125354) Previously, when calculating the offsets, we just compared the values as-is without any loss of precision. However, when the values were saved into doc values and loaded in the doc values loader, they could have lost precision. This meant that values that were not duplicates when calculating the offsets could now be duplicates in the doc values loader. This interfered with the de-duplication logic, causing incorrect values to be returned. My solution is to apply the precision loss before calculating the offsets, so that both the offsets calculation and the SortedNumericDocValues de-duplication see the same values as duplicates. (cherry picked from commit db73175)
#125529) (#125596) This patch builds on the work in #113757, #122999, and #124594 to natively store array offsets for boolean fields instead of falling back to ignored source when `synthetic_source_keep: arrays`. (cherry picked from commit af1f145) # Conflicts: # server/src/main/java/org/elasticsearch/index/IndexVersions.java # server/src/main/java/org/elasticsearch/index/mapper/BooleanFieldMapper.java
… source (#125709) (#125746) This patch builds on the work in #113757, #122999, #124594, and #125529 to natively store array offsets for unsigned long fields instead of falling back to ignored source when synthetic_source_keep: arrays. (cherry picked from commit 689eaf2) # Conflicts: # server/src/main/java/org/elasticsearch/index/IndexVersions.java # x-pack/plugin/mapper-unsigned-long/src/main/java/org/elasticsearch/xpack/unsignedlong/UnsignedLongFieldMapper.java
…stic#124594) This patch builds on the work in elastic#122999 and elastic#113757 to natively store array offsets for numeric fields instead of falling back to ignored source when `source_keep_mode: arrays`.
elastic#125529) This patch builds on the work in elastic#113757, elastic#122999, and elastic#124594 to natively store array offsets for boolean fields instead of falling back to ignored source when `synthetic_source_keep: arrays`.
… source (elastic#125709) This patch builds on the work in elastic#113757, elastic#122999, elastic#124594, and elastic#125529 to natively store array offsets for unsigned long fields instead of falling back to ignored source when synthetic_source_keep: arrays.
…source (elastic#125793) This patch builds on the work in elastic#113757, elastic#122999, elastic#124594, elastic#125529, and elastic#125709 to natively store array offsets for scaled float fields instead of falling back to ignored source when synthetic_source_keep: arrays.
…source (#125793) (#125891) This patch builds on the work in #113757, #122999, #124594, #125529, and #125709 to natively store array offsets for scaled float fields instead of falling back to ignored source when synthetic_source_keep: arrays. (cherry picked from commit 71e74bd) # Conflicts: # server/src/main/java/org/elasticsearch/index/IndexVersions.java
Follow up of #113757 and adds support to natively store array offsets for ip fields instead of falling back to ignored source.